# Multi-frame Processing
TinyLLaVA Video Qwen2.5 3B Group 16 512
Apache-2.0
TinyLLaVA-Video is a video understanding model built on Qwen2.5-3B and siglip-so400m-patch14-384 that uses a grouped resampler to process video frames.
Video-to-Text
Zhang199
76
0
xGen-MM-Vid Phi3 Mini R V1.5 128tokens 8frames
xGen-MM-Vid (BLIP-3-Video) is an efficient, compact vision-language model with an explicit temporal encoder, designed specifically for video understanding.
Video-to-Text
Safetensors English
Salesforce
398
11